This is one page of the R Handbook for Epidemiologists, but is being printed as a stand-alone page.

You can find the complete handbook on GitHub.

GIS basics

Overview

The spatial aspects of your data can provide many insights into an outbreak and help answer questions such as:

  • Where are the current disease hotspots?
  • How have the hotspots changed over time?
  • How accessible are health facilities? Are any improvements needed?

In this section, we will explore basic spatial data visualization methods using the tmap and ggplot2 packages. We will also walk through some basic spatial data management and querying methods with the sf package.

Choropleth map

Density heatmap

Health facility catchment area

Preparation

Load packages
First, load the packages required for this analysis:
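
A minimal sketch of the loading step follows. The package list is an assumption: only sf, tmap and ggplot2 are named on this page, and dplyr is added here for counting and joining tables. The original analysis may load other packages as well.

```r
# Assumed package set (not confirmed by the source page):
# sf, tmap and ggplot2 are named in the text; dplyr is added for table work.
library(sf)       # spatial data classes, spatial joins, buffers
library(dplyr)    # counting, joining and mutating tables
library(ggplot2)  # static maps via geom_sf()
library(tmap)     # static and interactive thematic maps
```

All four packages must already be installed, for example with install.packages().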

Sample case data

Sierra Leone: Admin boundary shapefiles
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-all-ad-min-level-boundaries

Sierra Leone: Population by ADM3
Data downloaded from HDX: https://data.humdata.org/dataset/sierra-leone-population

Sierra Leone: Health facility data from OpenStreetMap
Data downloaded from HDX: https://data.humdata.org/dataset/hotosm_sierra_leone_health_facilities

Plotting coordinates

The easiest way to plot XY coordinates (points) is to draw a map directly from the sf object that we created in the preparation section.

tmap offers simple mapping capabilities for both static (plot mode) and interactive (view mode) maps with just a few lines of code.

This blog post provides a good comparison of the different mapping options in R: https://rstudio-pubs-static.s3.amazonaws.com/324400_69a673183ba449e9af4011b1eeb456b9.html
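
As an illustration, the sketch below builds a small sf point object from longitude/latitude columns and draws it with tmap. The data frame, its column names (case_id, lon, lat) and the coordinates are made up for this example and are not the handbook's linelist.

```r
library(sf)
library(tmap)

# Hypothetical case records with WGS84 coordinates (illustrative values only)
linelist <- data.frame(
  case_id = c("a", "b", "c", "d"),
  lon     = c(-13.25, -13.27, -13.15, -13.00),
  lat     = c(8.45, 8.43, 8.45, 8.45)
)

# Convert the plain table to an sf object; EPSG:4326 is WGS84 longitude/latitude
linelist_sf <- st_as_sf(linelist, coords = c("lon", "lat"), crs = 4326)

# "plot" gives a static map; tmap_mode("view") would give an interactive one
tmap_mode("plot")

case_map <- tm_shape(linelist_sf) + tm_dots(size = 0.1)
case_map
```

Switching between static and interactive output only requires changing the tmap_mode() call; the map definition itself stays the same.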

Polygons and shapefiles

Choropleth maps are useful for visualizing your data by pre-defined areas, usually administrative units or health areas. In outbreak response, this helps target resources to specific areas, for example those with high incidence rates.

The current linelist data does not contain any information about the administrative units. Although it is ideal to store such information during the initial data collection phase, we can also assign administrative units to individual cases based on their spatial relationships (i.e. point intersects with a polygon).

The sf package offers various methods for spatial joins. See the documentation for the st_join method and the spatial join types here: https://r-spatial.github.io/sf/reference/geos_binary_pred.html

Spatially assign administrative units to cases
First, spatially intersect our case locations (points) with the ADM3 boundaries (polygons).
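
The point-in-polygon assignment can be sketched as follows. The two square "administrative units", the column name admin3_name and the case coordinates are invented for illustration; real ADM3 boundaries would be read from the shapefile.

```r
library(sf)

# Two hypothetical adjacent square polygons standing in for ADM3 units
west <- st_as_sfc(st_bbox(c(xmin = -13.3, ymin = 8.4, xmax = -13.2, ymax = 8.5),
                          crs = st_crs(4326)))
east <- st_as_sfc(st_bbox(c(xmin = -13.2, ymin = 8.4, xmax = -13.1, ymax = 8.5),
                          crs = st_crs(4326)))
adm3 <- st_sf(admin3_name = c("West", "East"), geometry = c(west, east))

# Hypothetical case points; the last one lies outside both polygons
cases <- st_as_sf(
  data.frame(case_id = c("a", "b", "c", "d"),
             lon = c(-13.25, -13.27, -13.15, -13.00),
             lat = c(8.45, 8.43, 8.45, 8.45)),
  coords = c("lon", "lat"), crs = 4326
)

# Left spatial join: every case is kept and gains the attributes of the
# polygon it intersects; cases outside all polygons get NA
cases_adm <- st_join(cases, adm3, join = st_intersects)
cases_adm
```

Because st_join performs a left join by default, cases that fall outside every polygon remain in the result with a missing unit name, which makes them easy to spot and investigate.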

Case counts by ADM3
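
One way to tabulate cases per administrative unit is sketched below with dplyr. The input table and its admin3_name column are hypothetical; if the input were an sf object, the geometry would first be dropped with sf::st_drop_geometry() so that the counting works on a plain table.

```r
library(dplyr)

# Hypothetical cases that already carry an administrative unit name
linelist_adm <- data.frame(
  case_id     = 1:6,
  admin3_name = c("West", "West", "East", "West", NA, "East")
)

# Count cases per unit and sort from most to fewest cases
case_adm3 <- linelist_adm %>%
  count(admin3_name, name = "cases") %>%
  arrange(desc(cases))

case_adm3
```

Cases with a missing unit name form their own row, which shows how many cases could not be assigned to any polygon.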

Choropleth mapping
Now that we have the administrative unit names assigned to all cases, we can start mapping the case counts by area (choropleth maps).

Since we also have population data by ADM3, we can add this information to the case_adm3 table created previously.
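
A sketch of adding population to the case_adm3 table is shown below. The key column admin3_pcode, the codes, the population figures and the derived rate per 10,000 population are all assumptions made for illustration; the actual HDX population file may use different column names.

```r
library(dplyr)

# Hypothetical case counts per ADM3 unit
case_adm3 <- data.frame(
  admin3_pcode = c("X001", "X002"),   # made-up unit codes
  cases        = c(30, 12)
)

# Hypothetical population table keyed by the same unit code
adm3_pop <- data.frame(
  admin3_pcode = c("X001", "X002"),
  total_pop    = c(15000, 24000)
)

# Attach population to the counts and derive cases per 10,000 population
case_adm3 <- case_adm3 %>%
  left_join(adm3_pop, by = "admin3_pcode") %>%
  mutate(case_10k_pop = round(cases / total_pop * 10000, 1))

case_adm3
```

Joining on a unit code rather than a unit name avoids mismatches caused by spelling variants of place names.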

Join this table with the ADM3 polygons for mapping
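
The join back to the polygons might look like the sketch below. The polygons, the admin3_name key and the counts are hypothetical. The sf object is placed first in the join so that the result keeps its geometry column.

```r
library(sf)
library(dplyr)

# Two hypothetical square polygons standing in for ADM3 units
west <- st_as_sfc(st_bbox(c(xmin = -13.3, ymin = 8.4, xmax = -13.2, ymax = 8.5),
                          crs = st_crs(4326)))
east <- st_as_sfc(st_bbox(c(xmin = -13.2, ymin = 8.4, xmax = -13.1, ymax = 8.5),
                          crs = st_crs(4326)))
adm3 <- st_sf(admin3_name = c("West", "East"), geometry = c(west, east))

# Hypothetical counts; "East" has no recorded cases and is absent from the table
case_adm3 <- data.frame(admin3_name = "West", cases = 3)

# Polygons on the left so every unit is kept; units without cases become 0
adm3_cases <- adm3 %>%
  left_join(case_adm3, by = "admin3_name") %>%
  mutate(cases = coalesce(cases, 0))

adm3_cases
```

Replacing missing counts with zero is a design choice: it lets units without cases appear on the map as zero rather than as missing data.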

Mapping the results
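
A choropleth of case counts can be sketched with ggplot2 as below. The polygons and counts are invented for the example, and the colour scale and theme are arbitrary choices rather than the handbook's own styling.

```r
library(sf)
library(ggplot2)

# Hypothetical ADM3 polygons that already carry a case count
west <- st_as_sfc(st_bbox(c(xmin = -13.3, ymin = 8.4, xmax = -13.2, ymax = 8.5),
                          crs = st_crs(4326)))
east <- st_as_sfc(st_bbox(c(xmin = -13.2, ymin = 8.4, xmax = -13.1, ymax = 8.5),
                          crs = st_crs(4326)))
adm3_cases <- st_sf(
  admin3_name = c("West", "East"),
  cases       = c(3, 0),
  geometry    = c(west, east)
)

# geom_sf() draws the polygons; the fill colour encodes the case count
p <- ggplot(adm3_cases) +
  geom_sf(aes(fill = cases)) +
  scale_fill_viridis_c(name = "Cases") +
  labs(title = "Case counts by ADM3 (illustrative data)") +
  theme_minimal()

print(p)
```

The same object could be mapped with tmap instead by passing the column name to tm_polygons().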

Health facility catchment area

It might be useful to know where the health facilities are located in relation to the disease hotspots.

Finding the nearest health facility
We can use the st_nearest_feature method from the sf package to assign the closest health facility to individual cases.

We can see that “Den Clinic” is the closest health facility for about 30% of the cases.
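
The nearest-facility lookup can be sketched as follows. The facility names and all coordinates are made up for this example. st_nearest_feature returns, for each case, the row index of the closest facility, which is then used to look up the facility name.

```r
library(sf)

# Hypothetical health facilities
facilities <- st_as_sf(
  data.frame(name = c("Clinic A", "Clinic B"),
             lon = c(-13.25, -13.10),
             lat = c(8.45, 8.45)),
  coords = c("lon", "lat"), crs = 4326
)

# Hypothetical case locations
cases <- st_as_sf(
  data.frame(case_id = 1:3,
             lon = c(-13.24, -13.26, -13.11),
             lat = c(8.46, 8.44, 8.45)),
  coords = c("lon", "lat"), crs = 4326
)

# Row index (into facilities) of the closest facility for each case
nearest_idx <- st_nearest_feature(cases, facilities)

# Attach the facility name to each case
cases$nearest_facility <- facilities$name[nearest_idx]

# Share of cases (in percent) served by each facility
round(100 * prop.table(table(cases$nearest_facility)), 1)
```

The final table gives the percentage of cases for which each facility is the closest one, the kind of summary quoted in the text.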

Visualizing the results on the map

Cases within 30 minutes' walking distance of the closest health facility

We can also explore how many cases are located within 2.5 km (~30 minutes' walking distance) of the closest health facility.

Note: For more accurate distance calculations, it is better to re-project your sf object to the appropriate local projected coordinate system, such as UTM (Earth projected onto a planar surface, with units in metres). In this example, for simplicity, we will stick to the World Geodetic System (WGS84) geographic coordinate system (Earth represented as a spherical surface, so the units are decimal degrees). We will use a general conversion of 1 decimal degree = ~111 km.
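
For comparison, the sketch below shows both options for a single made-up facility location: the degree approximation used on this page, and a re-projection to UTM where the buffer radius can be given directly in metres. EPSG:32628 (WGS84 / UTM zone 28N) is an assumption chosen because the example longitude falls in that zone; other parts of the country fall in zone 29N.

```r
library(sf)

# Hypothetical facility location in WGS84 longitude/latitude
facility <- st_sfc(st_point(c(-13.25, 8.45)), crs = 4326)

# Option 1: approximate 2.5 km in decimal degrees (1 degree = ~111 km)
buffer_deg <- 2.5 / 111
buffer_deg

# Option 2: re-project to UTM (units in metres) and buffer by 2500 m
facility_utm <- st_transform(facility, 32628)  # assumed zone, see lead-in
buffer_utm   <- st_buffer(facility_utm, dist = 2500)

# Area of the buffer in square metres, close to pi * 2500^2
st_area(buffer_utm)
```

The degree approximation is convenient, but one degree of longitude spans less than 111 km away from the equator, so the projected approach is preferable when distances matter.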

See more information about map projections and coordinate systems: https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/

First, create a circular buffer with a radius of ~2.5 km around each health facility.

Intersect this with the cases

Count the results
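
The three steps above (buffer, intersect, count) can be sketched together as below. The facilities, cases and column names are invented. One assumption deserves attention: recent sf versions use spherical geometry (s2) for longitude/latitude data and then interpret the buffer distance in metres, so this sketch switches s2 off to reproduce the decimal-degree approach described in the text.

```r
library(sf)

# Use planar geometry so that dist below is read as decimal degrees
# (assumption: with s2 on, sf would read dist as metres for lon/lat data)
sf_use_s2(FALSE)

# Hypothetical health facilities and cases
facilities <- st_as_sf(
  data.frame(name = c("Clinic A", "Clinic B"),
             lon = c(-13.25, -13.10), lat = c(8.45, 8.45)),
  coords = c("lon", "lat"), crs = 4326
)
cases <- st_as_sf(
  data.frame(case_id = 1:4,
             lon = c(-13.24, -13.26, -13.11, -13.18),
             lat = c(8.46, 8.44, 8.45, 8.50)),
  coords = c("lon", "lat"), crs = 4326
)

# Step 1: ~2.5 km buffer expressed in degrees (1 degree = ~111 km);
# sf warns here that buffering lon/lat data is only approximate
fac_buffer <- st_buffer(facilities, dist = 2.5 / 111)

# Step 2: for each case, list the buffers it falls inside
hits <- st_intersects(cases, fac_buffer)

# Step 3: flag and count cases inside at least one buffer
cases$within_30min <- lengths(hits) > 0
table(cases$within_30min)
n_far <- sum(!cases$within_30min)

# Restore the default spherical geometry engine
sf_use_s2(TRUE)
```

Here n_far holds the number of cases lying outside every buffer, which corresponds to the cases more than about 30 minutes from any facility.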

202 out of 1000 cases (20.2%, shown as red dots in the map below) live more than 30 minutes away from the nearest health facility.

Visualize the results